Abstract

Background: The metabolic syndrome (MetS), a complex disorder involving hypertension, obesity, dyslipidemia 
and insulin resistance, is a major risk factor for heart disease, stroke, and diabetes. The Lyon Hypertensive (LH), Lyon 
Normotensive (LN) and Lyon Low-pressure (LL) rats are inbred strains simultaneously derived from a common 
outbred Sprague Dawley colony by selection for high, normal, and low blood pressure, respectively. Further studies 
found that LH is a MetS susceptible strain, while LN is resistant and LL has an intermediate phenotype. Whole 
genome sequencing determined that, while the strains are phenotypically divergent, they are nearly 98% similar at the 
nucleotide level. Using the sequence of the three strains, we applied an approach that harnesses the distribution of 
Observed Strain Differences (OSD), or nucleotide diversity, to distinguish genomic regions of identity-by-descent (IBD) 
from those with divergent ancestry between the three strains. This information was then used to fine-map QTL identified 
in a cross between LH and LN rats in order to identify candidate genes causing the phenotypes. 

Results: We identified haplotypes that, in total, contain at least 95% of the identifiable polymorphisms between the Lyon 
strains that are likely of differing ancestral origin. By intersecting the identified haplotype blocks with Quantitative Trait 
Loci (QTL) previously identified in a cross between LH and LN strains, the candidate QTL regions have been narrowed by 
78%. Because the genome sequence has been determined, we were further able to identify putative functional variants in 
genes that are candidates for causing the QTL. 

Conclusions: Whole genome sequence analysis between the LH, LN, and LL strains identified the haplotype structure 
of these three strains and identified candidate genes with sequence variants predicted to affect gene function. This 
approach, merged with additional integrative genetics approaches, will likely lead to novel mechanisms underlying 
complex disease and provide new drug targets and therapies. 
Background

Metabolic Syndrome (MetS) is a constellation of disorders
that includes obesity, insulin resistance or hyperglycemia,
dyslipidemia and hypertension, the combination of which
has been found to significantly increase the risk for cardiovascular
disorders and type II diabetes [1]. According to data
compiled by the National Health and Nutrition Examination 
Survey in 2009, more than one-third of the U.S. population 
falls into the criteria for metabolic syndrome [2], making it a 
major public health issue. Diagnosis of MetS is made with 
the co-occurrence of any three of the defining features [1]. 
While the associated features often occur together and have a
clear genetic contribution, the common pathways or mechanisms
linking them in MetS are not well understood.

Identification of the genetic contribution to complex 
disease is greatly aided by comprehensive studies involving 
genetic models. The Lyon inbred rat strains were derived
in the early 1970s from a single outbred Sprague-Dawley
(SD) colony by selection for different blood pressure levels:
hypertension (Lyon Hypertensive; LH/Mav), normotension (Lyon
Normotensive; LN/Mav) and hypotension (Lyon Low-pressure;
LL/Mav) [3]. LN rats have normal blood pressure, LL rats have
late-onset hypotension, and LH rats are spontaneously
hypertensive by 5 weeks of age [4,5]. Although LH was initially
established as a model of hypertension, several defining
features of MetS have also been observed in this strain [1,6].
These include obesity, dyslipidemia with increased total
triglycerides and total cholesterol, and increased insulin and
insulin:glucose ratio, which suggests a susceptibility to insulin
resistance [4,6,7]. The LH rat is therefore a MetS-susceptible
strain. The
study of the Lyon strains, having differing genetic suscepti- 
bilities to traits defining MetS, can be used to dissect the 
underlying genetic causes of the defining features of a dis- 
order that carries a significant health burden [8,9]. 

We previously identified quantitative trait loci (QTL) 
for phenotypes defining MetS in an F2 intercross be- 
tween LH and LN rats, including body weight, blood 
pressure, plasma lipid levels, and plasma insulin levels 
[10]. While many of the traits were influenced by QTL 
on different chromosomes, this study determined that 
rat chromosome (RNO) 17 contains QTLs for multiple 
features of MetS (body weight; blood pressure; plasma 
cholesterol, triglyceride, and insulin levels). While blood 
pressure and plasma lipid levels were correlated in the 
F2 cross, body weight was not found to be correlated 
with either of these traits [6], suggesting the QTL on 
RNO 17 for body weight may have been due to the co- 
segregation of a passenger locus during selection rather 
than the pleiotropic effect of a single MetS gene on this 
chromosome. 

Because the inbred strains were derived from a single 
SD colony, the Lyon strains share high genetic similarity. 
Phylogenetic studies consistently find the LH, LN, and 
LL strains in a well-defined cluster of SD-derived inbred 
rat strains [11-13]. The shared lineage between LH and LN 
strains also resulted in a paucity of informative poly- 
morphic markers between the strains; therefore, the QTL 
intervals in our previous mapping study were large, and 
generating congenic and consomic strains by marker- 
assisted selection was a challenge. Consomic strains intro- 
gressing the more genetically divergent BN chromosomes 
13 or 17 succeeded in recapitulating some of the pheno- 
types - body weight, triglycerides, and blood pressure - 
that were identified in the QTL analysis [14,15]. However, 
the genetic similarity between the Lyon strains presents an 
opportunity to utilize haplotype mapping to fine-map the 
loci, if sufficient polymorphic markers could be identified. 

In 2007, the STAR Consortium released genotypes for 
163 inbred rat strains, including the LH and LN strains, 
from a 20,238-SNP panel [12]. As was previously deter- 
mined using microsatellite markers [11], phylogenetic ana- 
lyses for the rat strains using the 20 K SNP panel indicated 
a close genetic relationship between the LH and LN 
strains. Of the 20,238 SNPs in the panel, only 1,739 
(8.59%) are polymorphic between LH and LN. Further- 
more, the variants clustered into what could be considered 
putative LD blocks. We assert the genetic determinants for 
the LH phenotypes reside in LD blocks that differ between 
the strains, due to artificial selective sweeps from the SD 
progenitors. Yet, like any SNP genotyping panel, the
STAR Consortium panel, determined by an ascertainment
panel consisting of SS/Jr, GK/Ox, SHRSP/Bbb, WKY/Bbb
and F344/Stm strains [12], is subject to the ascertainment
biases observed in SNP panels in general [16] that can impart
large effects on many metrics of linkage disequilibrium
[17]. Resequencing of the genomes eliminates SNP
genotyping biases and allows for more accurate LD analyses;
however, until recently only a few rat strains had
available genome sequence: BN/SsNHsD [18], SHR/OlaIpcv
[19], and SD.

We previously determined the single nucleotide poly- 
morphism (SNP) density across the genomes of the SHR 
and BN strains as a means to visualize the substantial di- 
versity between the two strains [19]. When plotting the 
genome-wide distribution of SNPs between the strains,
we observed a bimodal distribution with one peak in the 
distribution curve having a low SNP density and the 
other having a high SNP density [19]. The Observed 
Strain Differences (OSD), or the density of variants be- 
tween two strains across a fixed genome sequence win- 
dow size, represent a local measure of polymorphic sites. 

Recently we published the genome sequences of 27 
different inbred rat strains including the LH, LL, and LN 
strains [13]. In that study we reported data regarding
artificial selective sweeps among the rat strains, and suggested
that shared genetic material between strains originating
from the same founder population, irrespective of
their phenotype, reflects their common ancestry. Considering
that LH and LN rats were generated through selective
breeding from a common origin, we assert that the
regions with low SNP density are likely regions of shared
lineage, while the regions with high density are likely
to be from different ancestral chromosomes that contain
genetic determinants of their phenotypes due to artificial
selection from the founder outbred SD rats. As reported
here, OSD analysis was performed in the Lyon rat 
strains in order to fine-map the QTL, particularly on 
RNO 17, and identify candidate genes relating to MetS in 
the LH rat by comparing sequence variation in this 
strain to that of the other Lyon strains. 

Results 

Genome-wide Observed Strain Difference (OSD) analyses 

For the OSD analyses, six comparisons were performed 
in two groups. First, each of the three Lyon strains was 
compared with the BN reference genome (LH/BN; LN/ 
BN; LL/BN). Second, all possible pairwise comparisons
between the Lyon strains (LH/LN; LH/LL; LL/LN) were
performed to identify regions of the genomes between
the strains with ancestrally distinct haplotypes derived 
from the outbred SD rats. 

Figure 1 Distributions of Observed Strain Difference (OSD) over 100-Kb windows. OSD distribution is represented as the curve of kernel density estimates (Y axis) against OSD (X axis). The scale of the Y axis is square-root transformed. The Polymorphism Enrichment Threshold (PET) for each comparison is marked with a vertical line.

In all comparisons (Figure 1), the OSD distribution of
the 27,199 100-Kb windows spanning the rat genome is
bimodal, as was previously reported in the comparison
between SHR and BN strains [19]. The first (left) peak in
the bimodal distribution contains regions of the genome
identical by descent, with OSD values close to zero (i.e.
low SNP density). The second (right) peak in the bimodal
distribution contains regions of the genome that
are ancestrally divergent between the two strains, having
high OSD values (i.e. high SNP density). A distinct valley
separates the two peaks; we define the OSD value at this
valley as the Polymorphism Enrichment Threshold (PET).
The average PET in the Lyon vs. BN and the pairwise
Lyon strain comparisons is 4.5 × 10^-4 and 3.7 × 10^-4,
respectively (Table 1). Regions with SNP density values
higher than the PET represent the windows within ancestral
haplotype blocks that differ between the strains.
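
The OSD and PET definitions above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names (osd_per_window, find_pet) are hypothetical, variant positions are assumed to be 1-based, and a plain histogram is substituted for the kernel density estimate shown in Figure 1. The PET is taken as the centre of the least populated region between the two tallest histogram peaks.

```python
import math

WINDOW = 100_000  # window size in bp, matching the 100-Kb windows in the text


def osd_per_window(variant_positions, chrom_length, window=WINDOW):
    """Return the OSD (variants per bp) of each non-overlapping window."""
    n_windows = math.ceil(chrom_length / window)
    counts = [0] * n_windows
    for pos in variant_positions:  # 1-based positions assumed
        counts[(pos - 1) // window] += 1
    # The last window may be shorter than the others.
    sizes = [min(window, chrom_length - i * window) for i in range(n_windows)]
    return [c / s for c, s in zip(counts, sizes)]


def find_pet(osd_values, n_bins=30):
    """Estimate the PET as the centre of the valley between the two modes.

    Assumption: a histogram stands in for the kernel density estimate.
    """
    lo, hi = min(osd_values), max(osd_values)
    if hi == lo:
        raise ValueError("all windows have the same OSD")
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in osd_values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    # Local maxima: strictly above the left neighbour, not below the right one.
    peaks = [
        i for i in range(n_bins)
        if (i == 0 or hist[i] > hist[i - 1])
        and (i == n_bins - 1 or hist[i] >= hist[i + 1])
    ]
    if len(peaks) < 2:
        raise ValueError("distribution does not look bimodal")
    # Keep the two tallest peaks, then order them left to right.
    left, right = sorted(sorted(peaks, key=lambda i: (-hist[i], i))[:2])
    between = range(left + 1, right)
    lowest = min(hist[i] for i in between)
    valley = [i for i in between if hist[i] == lowest]
    mid = valley[len(valley) // 2]  # middle of the emptiest stretch
    return lo + (mid + 0.5) * width
```

Windows whose OSD exceeds the returned value would then be treated as belonging to divergent haplotype blocks. The number of bins is an arbitrary choice here and would need tuning on real data.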

Comparing the SNP densities between the groups of 
comparisons, distinct differences in the nature of the 
distribution curves were observed (Figure 1). While all 
comparisons show a bimodal distribution, the number of 
windows with low SNP density (and accordingly low 
OSD values) is approximately 4-fold higher in the Lyon 
pairwise group than in the Lyon vs BN group. Con- 
versely, the number of windows with high SNP density 
(high OSD values) is over 3-fold lower in the Lyon pairwise
groups compared to the Lyon vs BN groups. This
trend is consistent with the fact that the Lyon strains are
evolutionarily close to each other but evolutionarily distant
from the BN strain [12]. It also explains the low
amount of polymorphism between the Lyon strains as 
compared to the Lyon vs BN comparisons (Table 1). The 
percentage of 100Kb windows with high SNP density in- 
creases from an average of 14.40% in Lyon pairwise 
comparisons to 66.44% in Lyon vs BN comparisons. 

In order to determine haplotype blocks between the 
strains being compared, adjacent windows with SNP 
density exceeding the PET were concatenated. There 
were 3-fold more divergent haplotype blocks in the Lyon 
vs BN comparisons compared to the Lyon pairwise com- 
parisons, with an average of 1,408 in Lyon vs BN groups 
compared to an average of 441 in Lyon pairwise groups 
(Table 1). Furthermore, the divergent haplotypes in the 
Lyon pairwise comparisons were on average less than 0.9
Mb in length, whereas the Lyon vs BN haplotypes were
nearly 50% longer, with an average of over 1.3 Mb. Together,
these data are consistent with the breeding history
of the Lyon rat strains.
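
The concatenation step described above can be sketched as follows. This is an illustrative reading of the text, not the original implementation: the function name call_blocks is hypothetical, and blocks are reported as 0-based, half-open (start, end) coordinates in bp on a single chromosome.

```python
def call_blocks(osd_values, pet, window=100_000):
    """Merge runs of adjacent windows with OSD > PET into haplotype blocks."""
    blocks = []
    start = None  # index of the first window in the current run, if any
    for i, value in enumerate(osd_values):
        if value > pet:
            if start is None:
                start = i
        elif start is not None:
            blocks.append((start * window, i * window))
            start = None
    if start is not None:  # a run that reaches the end of the chromosome
        blocks.append((start * window, len(osd_values) * window))
    return blocks


# Hypothetical OSD values for six consecutive windows.
example = call_blocks([0.0, 5e-4, 6e-4, 0.0, 0.0, 7e-4], pet=3.7e-4)
```

With these made-up values, the two adjacent windows above the threshold are merged into one 200-Kb block, and the final window forms a separate 100-Kb block.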

The haplotype blocks were then aligned to the reference
BN sequence to determine their distribution in the
rat genome. Regardless of the pairwise comparison, the
distribution of haplotype blocks across the genome was
highly variable. For example, in the LH/LN comparison
(Figure 2, Additional file 1: Table S1), approximately
15.5% of the genome contains divergent haplotype
blocks. In comparison, nearly 31% of chromosomes 2, 10
and 12 are comprised of divergent haplotype blocks,
while only approximately 5% of chromosomes 7, 14 and
20 encompass divergent haplotype blocks. The latter
three chromosomes also have long stretches of 50 Mb or
more where there is no window exceeding the PET, that
is, regions that are shared ancestrally.

Table 1 Summary Statistics for OSD analyses for six strain comparisons

Figure 3 The overlap between previously reported QTLs and the divergent haplotype blocks between LH and LN. (A): Genome-wide comparison; (B): A focus on chromosomes 1, 2, 3, 5, 7, 10, 13 and 17, each of which contains at least one QTL identified by Bilusic et al. In both panels, red marks QTL intervals identified by Bilusic et al. and blue marks intervals containing divergent haplotype blocks. Idiograms were drawn using Idiographica [21].

Because of the phenotype-driven selection of the Lyon
strains from a common SD ancestor, it is likely that divergent
haplotypes arising from artificial selective sweeps
will contain variants causing the phenotypic differences
between the strains. In order to fine-map QTL intervals
for MetS traits previously mapped in a cross between
LH and LN rats, we aligned both the haplotype blocks
and QTL onto the rat genome and determined where
the two overlap [10]. Using the genomic coordinates
provided by the Rat Genome Database [20], the QTL intervals
cover a total of approximately 860 Mb, or 33% of the entire
rat genome (Figure 3a). However, only 21% of these intervals
(183 Mb) contain haplotypes differing between
LH and LN strains. Therefore, these studies allow for in
silico fine-mapping of QTL intervals, narrowing them by
nearly 80%, particularly on the chromosomes with
relatively few divergent haplotypes such as chromosomes
7 and 17 (Figure 3b, Additional file 1: Table S1).
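
The overlap between QTL intervals and divergent haplotype blocks amounts to an interval intersection. The sketch below shows one way to compute it for a single chromosome; the function names and the coordinates in the example are hypothetical, and intervals are assumed to be half-open (start, end) pairs in bp. QTL intervals for different traits may overlap each other, so both inputs are merged first to avoid counting any base twice.

```python
def merge(intervals):
    """Collapse overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def total_length(intervals):
    return sum(end - start for start, end in intervals)


# Hypothetical coordinates, for illustration only.
qtl = [(0, 50), (40, 100)]
blocks = [(10, 20), (45, 60), (90, 120)]
retained = total_length(intersect(qtl, blocks)) / total_length(merge(qtl))
narrowed_by = 1 - retained
```

In this toy example 35 of the 100 bases covered by QTL also lie within blocks, so the candidate region is narrowed by 65%. A genome-wide version would apply the same functions chromosome by chromosome and sum the lengths.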

Patterns of Haplotype Blocks on RNO17

Despite strong evidence that RNO17 has genetic determinants
contributing to multiple symptoms of MetS, the
paucity of markers polymorphic between LH and LN
presents a particular challenge to fine-mapping the genetic
loci on this chromosome. Therefore, we applied the
OSD-based approach to RNO17 to fine-map the genetic
loci identified in the cross between LH and LN rats
(Figure 4). When comparing the Lyon vs. BN groups to
the Lyon pairwise groups, it is clear that the majority of
the chromosome is divergent in the Lyon vs. BN comparisons,
with only small haplotype blocks in common,
while the vast majority of RNO17 is conserved among
the Lyon strains. We identified 14 haplotype blocks on
RNO17 that differ between the LH and LN strains
(Table 2). These blocks span 7.5 Mb, or
7.7% of the chromosome, and contain 11,852 of 12,175
SNPs (97.3%) between LH and LN rats on this chromosome
identified by resequencing (Table 2, Additional file
1: Table S1). The percentage of RNO17 representing
ancestrally different haplotype blocks is half of the genomic
average of 15.4%, further demonstrating the similarity
between LH and LN strains on this chromosome.

The LH and LN strains have previously undergone
genome-wide SNP genotyping by the STAR Consortium
[12]. From these genotyping results we deduced a list of
putative haplotype blocks on RNO17 and compared them to
the OSD-based results (Table 2). The haplotype blocks
identified by both approaches are largely similar, with both
identifying blocks at 29-30 Mb, 39 Mb, 42-43 Mb, 62-65
Mb, 69-70 Mb and 91 Mb. However, the present approach
identified three novel putative haplotype regions at 30.7-30.9
Mb, 53.4-53.8 Mb, and 83.6-83.9 Mb. In addition,
while both approaches identified a haplotype block ending
at approximately 43.1 Mb, the start site of the block as
identified by OSD analysis extends the 5' end by approximately
500 Kb compared to the one identified by SNP
genotyping (41.7 vs 42.2 Mb, respectively), making the
block about 47% longer. On the other hand, SNP genotyping
identified a 1.6 Mb haplotype block spanning 38.2-39.8
Mb, while OSD analysis refined this block to 0.2 Mb (39.4-39.6
Mb), which can largely be attributed to the full map
resolution provided by resequencing. Overlaying the haplotypes
with the mapped QTL implicates blocks 1-12 as most
likely to contain causal genes for the mapped traits.

Genes and SNVs located in Haplotype Blocks 

Using the OSD analysis to identify ancestrally different 
haplotypes allows us to focus initial efforts on identifying
causal genes for the QTL in the LH rat. The 477 haplo- 
types divergent between LH and LN contain 3,687 
protein-coding genes; 1,789 of these genes fall within one 
or more of the previously identified QTLs [10]. The rese- 
quencing of the Lyon strains identified 643,234 SNPs and 
327,067 indels across the genome in the LH/LN compari- 
son, of which 630,814 and 235,414 are located in the 
haplotype blocks [13]. Genome-wide, there are 2,391 SNPs 
and 542 indels in the LH/LN comparison that Variant Ef- 
fect Predictor (VEP) [22] classified as causing non- 
synonymous coding, frameshift, splice site changes, and/ 
or stop codon gain/loss. Nearly all of these are located in 
the haplotype blocks, including 2,083 SNVs and 383 indels 
in 1,316 genes. Overlaying these SNVs and indels with 
QTL regions identified 416 genes with putative functional 
variation between the LH and LN strains. 
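
The filtering described above, which keeps variants with a predicted functional consequence that also fall inside a divergent haplotype block, can be sketched as follows. The consequence labels are plain strings chosen for this example and are not guaranteed to match VEP output; the function names and the variant records are likewise hypothetical. Blocks are assumed to be sorted, non-overlapping, half-open (start, end) intervals on one chromosome.

```python
from bisect import bisect_right

# Illustrative labels for the consequence classes named in the text.
FUNCTIONAL = {
    "non_synonymous_coding",
    "frameshift",
    "splice_site",
    "stop_gained",
    "stop_lost",
}


def in_blocks(pos, blocks):
    """True if pos lies inside one of the sorted, disjoint blocks."""
    starts = [start for start, _ in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    return i >= 0 and pos < blocks[i][1]


def functional_in_blocks(variants, blocks):
    """Keep variants that are both functional and inside a block."""
    return [
        v for v in variants
        if v["consequence"] in FUNCTIONAL and in_blocks(v["pos"], blocks)
    ]


# Made-up records, for illustration only.
blocks = [(100, 200), (500, 600)]
variants = [
    {"pos": 150, "consequence": "frameshift"},
    {"pos": 150, "consequence": "synonymous"},
    {"pos": 300, "consequence": "stop_gained"},
    {"pos": 599, "consequence": "splice_site"},
    {"pos": 600, "consequence": "stop_lost"},
]
kept = functional_in_blocks(variants, blocks)
```

Only the first and fourth records survive: the second is not functional, the third lies between blocks, and the fifth sits on the excluded end coordinate of the second block.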

On chromosome 17, there are 27 protein-coding genes
located within the haplotype blocks differing between LH
and LN strains (Table 3). All except 2 of these genes fell
within one or more of the previously reported QTLs associated
with LH phenotypes (Figure 3b) [10]. We
identified 24 SNVs and 7 indels in 15 genes on RNO17
classified by VEP as affecting protein sequence or splice sites
(Table 3). Each of these variants fell within one of the
haplotype blocks differing between LH and LN strains. Of
the 31 variants, 18 variants in 11 genes were the minor allele
in the LH rat and colocalized with MetS QTL.
There were three genes (RGD1563300, Prl5a2, and
Prl4a1) with LH variants affecting splice sites and three
genes (Prl4a1, ENSRNOG00000012418, and LOC364753)
with variants that were classified as "probably damaging"
or "possibly damaging" by PolyPhen 2 version 2.2.2 [23].

Figure 4 Divergent haplotype blocks of different comparisons on RNO17. From top: LH/BN, LL/BN, LN/BN, LH/LL, LH/LN and LL/LN. At the bottom, the LH/LN haplotype blocks identified by SNP genotyping were added as reference.

Table 2 Divergent haplotype blocks between LH and LN strains identified by OSD analysis and SNP genotyping data
All genome position coordinates are based on the rn4 assembly.

To interrogate the SNVs' possible roles in MetS traits,
Fisher's exact test was performed to test whether the LH
allele SNVs listed in Table 3 are significantly enriched
among the sequenced rat strains [13] that have one or
more symptoms of MetS: obesity, dyslipidemia and
hypertension. One variant in LOC364753 (17:
G65,701,876T) showed significant enrichment at p <
0.05; it was found to be enriched (p = 0.01) among the
hypertensive LH, SS and SHR strains.
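
For readers unfamiliar with the enrichment test, a two-sided Fisher's exact test on a 2x2 table can be written directly from the hypergeometric distribution. The sketch below is a generic implementation, not the code used in this study, and the counts in the example are invented: they do not reproduce the strain counts or the p-value reported above. Rows are strains with and without the phenotype; columns are strains carrying and not carrying the allele.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: affected carriers      b: affected non-carriers
    c: unaffected carriers    d: unaffected non-carriers
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x affected carriers given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x) for x in range(lo, hi + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Invented example: 3 affected strains all carry the allele, 24 others do not.
p_value = fisher_exact_two_sided(3, 0, 0, 24)
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.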

Variant confirmation 

To verify the existence of SNVs within the haplotype 
blocks on RNO17, we performed Sanger sequencing of 6
amplicons containing 10 of the variants listed in Table 3 
(Additional file 2: Table S2). These six amplicons gener- 
ated a total of 3,848 base pairs of sequence. All 10 vari- 
ants were validated by Sanger sequencing, with the LH 
and LN allele identical to genome resequencing results. 
Furthermore, we were able to verify 20 of the 23 SNVs 
that were annotated in the genome sequence and identi- 
fied an additional SNV that was not previously anno- 
tated. These results reflect the high quality of the 
genome sequence of the strains. 

Discussion 

In this paper we report a simple technique to distinguish
genomic regions of identity-by-descent (IBD) from those
with different ancestry using genome resequencing results
from a group of rat strains that share a common
origin but were selectively inbred for differing phenotypes.
Genetic studies in phenotype-selected inbred rodent
strains derived from a common ancestor are a
common strategy to map loci for many complex disorders,
ranging from anxiety [24,25] to hypertension [26].
The similar genetic background of such strains minimizes the
heterogeneity outside of the phenotypically selected
regions, making IBD mapping a
means to exclude regions of the genome that are unlikely
to cause disease. However, their similar genetic backgrounds also
present problems to the investigator, as their similarities 
result in a paucity of polymorphic markers available to 
attain an acceptable marker resolution for mapping. 
Using next-generation sequencing (NGS) techniques to 
resequence the genomes of these strains can resolve this 
problem as NGS, by definition, samples all bases, and 
hence should be able to identify all polymorphisms be- 
tween strains, allowing high-resolution IBD mapping. In 
the case of the Lyon strains, which share similar SD ances- 
tors, we distinguished ancestral haplotypes that have been 
fixed in the course of selective inbreeding from the ran- 
dom mutations that were fixed after the division of the 
strains in order to fine-map QTL for traits defining MetS. 

The results presented here are consistent with previous studies
of the genomes of different laboratory mouse
strains, which also observed bimodal distributions of
SNP densities in non-overlapping windows across the
genome [27,28]. By resequencing a selection of putative
SNPs from each peak, Wade et al. found that SNPs identified
in the low SNP density regions are likely to be
spurious, while those identified in the high SNP density
regions are likely to be validated [27]. Furthermore, by
comparing the distribution of nucleotide diversity (π)
[29] among synonymous SNPs in cDNA transcripts in
laboratory mouse strains, wild-derived mouse strains
(control for high diversity), and a rat strain from a
single founder (control for low diversity), Reuveni and
colleagues assert that the bimodal distribution of π in laboratory
mice is contributed by two groups of SNPs:
intra-subspecific SNPs and inter-subspecific SNPs, represented
by the low-π and high-π peaks, respectively [30].
While that paper mainly discusses mouse subspecies, we
expect the implication can also be extrapolated to strain
differences. In this case, SNPs that are represented by
the low-π peak (or the low-OSD peak here) are
likely to be SNPs that arose after the separation of the
strains, while SNPs that are represented by the high-π
peak, or high-OSD peak, are SNPs that originate
from the genetic differences between the founder strains.

The observed bimodal distribution of SNP density has 
previously been reported by Wang et al. in a similar 
comparison between the indica and japonica subspecies 
of rice, using microarray genotyping and a window size of 
200 kb [31]. Furthermore, whole-genome resequencing
between individual strains within the indica subspecies 
showed similar results as lower-density SNP typing [32]. 

The evolutionary histories of rice and the Lyon strains are
different. However, artificial selection from a single origin
was put forward by the authors as an explanation for
the bimodal distribution of SNP density. Since the Lyon
strains were in fact artificially selected from a single origin
based on their blood pressure, we consider the authors'
conclusion about the relationship between the
distribution of SNP densities and phenotype in rice can
be applied to the Lyon rats. Specifically, the non-IBD regions
between LH and LN contain genetic determinants
for the divergence between LH and LN phenotypes. This
approach has also been used among mammals, to identify
genomic regions that underlie the domestication of
dogs using whole-genome sequence [33].

Table 3 Genes and non-synonymous variations in LH vs LN haplotype blocks on RNO17

Gene name | Gene start (bp) | Gene end (bp) | Description | Variant | Amino acid change | Strain | Predicted effect

[block heading illegible]
[illegible] | [illegible] | [illegible] | N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase | [illegible] | A131T | LN | [illegible]

Block 2 (30.1-30. Mb)
LOC100362620 | 30,267,398 | 30,267,637 | CDC28 protein kinase regulatory subunit 2 | G30,267,440A | E15K | LN | Benign
 | | | | T30,267,495C | L33P | | Benign
 | | | | G30,267,526C | W43C | | Benign
 | | | | G30,267,578T | E61* | | N/A

Block 5 (41.7-43.1 Mb)
RGD1563300 | 42,228,845 | 42,229,431 | Similar to 60S ribosomal protein L29 (P23) | g.42299020_42299027delACTCCGGT | | LH | Essential splice site
 | | | | g.42299028_42299029insCACAAAGATA | X29fs | LN | Frameshift
Prl5a2 | [illegible] | [illegible] | Prolactin family 5, subfamily a, member 2 | [illegible] | [illegible] | [illegible] | [illegible]
Prl4a1 | [illegible] | 43,284,152 | Prolactin-4A1 | [illegible] | T141N | LH | Splice site, possibly damaging

Block 7 (53.4-53.8 Mb)
Stard3nl | 53,402,078 | 53,436,081 | MLN64 N-terminal domain homolog | | | |
ENSRNOG00000027571 | 53,441,303 | 53,484,317 | Uncharacterized protein | G53,441,394A | T161I/T313I* | LH | Benign
 | | | | C53,463,467T | V124I | LH |
 | | | | g.53483901_53483997del | 73_105del | LN | Frameshift
ENSRNOG00000012418 | 53,496,423 | 53,528,130 | Uncharacterized protein | G53,527,707T | P81T | LH | Possibly damaging
 | | | | T53,527,779A | T57S | LH | Benign
 | | | | G53,528,005A | P15S | LH | Probably damaging
 | | | | G53,528,025T | A8D | LH | Possibly damaging
Amph | 53,558,804 | 53,802,936 | Amphiphysin | C53,558,811A | R632L | LH | Benign
 | | | | g.53641892_53641893delCT | c.152_153delAG | LN |

[block heading illegible]
[illegible] | [illegible] | [illegible] | [illegible] inhibitor homolog | | | |
RGD1564129 | 62,684,244 | 62,686,747 | Uncharacterized protein | | | |
Cul2 | 62,701,289 | 62,741,344 | Cullin-2 | | | |
Crem | 62,770,633 | 62,837,668 | cAMP-responsive element modulator | | | |
Epc1 | 63,041,415 | 63,104,046 | Enhancer of polycomb homolog 1 | A63,102,600T | L55H | LH | Benign

Block 9 (63.4-64.9 Mb)
[illegible] | [illegible] | [illegible] | [illegible] | A63,528,108T | S193C | LN | [illegible]
[illegible] | [illegible] | [illegible] | [illegible] | [illegible] | P301L | [illegible] | [illegible]
[illegible] | [illegible] | [illegible] | Armadillo repeat containing 4 | | | |
Wac | [illegible] | [illegible] | WW domain containing adaptor with coiled-coil | T64,570,567G | C200G | LH | Benign

Block 10 (65.0-65.9 Mb)
LOC364753 | 65,681,793 | 65,702,382 | similar to NSFL1 (p97) cofactor (p47) | G65,701,876T | G80C | LH | Possibly damaging

Block 13 (83.6-83.9 Mb)
ENSRNOG00000031981 | 83,837,499 | 83,861,361 | Uncharacterized protein | C83,860,804T | P40S | LH | Benign
 | | | | g.83861107_83861122delATCCCTGCATCCCTGC | I141fs | LN | Frameshift
 | | | | g.83861220_83861227delCCCTGCAT | T178fs | LH | Frameshift
 | | | | g.83837957_83837958insA | | LH | Splice site

Block 14 (90.8-91.1 Mb)
Plxdc2 | 90,572,391 | 90,982,073 | Plexin domain-containing protein 2 | | | |

Variants in bold were validated by Sanger sequencing.

A caveat to our approach is the assumption that the 
phenotype differences between the Lyon rat strains are 
due to phenotype-driven selection of ancestrally different 
loci. While we cannot formally rule out that random mu- 
tation after divergence of the strains does have some 
phenotypic outcome, given the multigenic nature of the 
traits, we assert this approach will identify at least a subset 
of the disease-causing variants. In addition, we cannot 
confirm the method described in this paper is able to iden- 
tify all divergent haplotype blocks between two similar 
strains, particularly in genome regions lacking adequate 
coverage. However this approach is appropriate to priori- 
tize genetic loci that may contain genetic determinants for 
the phenotype in question which can be verified in vivo by 
using consomic and/or congenic strains [34]. 

In the Lyon pairwise comparisons, no more than 15% 
of the 100 Kb blocks on the genome have been identified 
as divergent haplotype blocks, yet these blocks contain 
more than 97% of all identifiable SNPs in the compari- 
sons (Table 1). Specifically, in the LH/LN comparison, 
divergent haplotype blocks encompass 420.0 Mb of the 
rat genome. QTL intervals mapped in a cross between 
the two strains encompass 827 Mb of the genome. Com- 
bining the QTL and haplotype mapping narrowed the 
loci by nearly 80% to 183 Mb [10], allowing a more re- 
fined focus for gene discovery. 

As mentioned previously, multiple QTL for MetS 
traits were mapped to RN017 in an LH x LN F2 inter- 
cross [10]. However, the QTL intervals span nearly the 
entire chromosome due to the relative low density of the 
genetic map. The approach reported here allowed for in 
silico fine-mapping of the QTL by narrowing the pos- 
sible candidate regions and thus reducing the number of 
candidate genes to 25. Of these, 11 are protein-altering 
variants in the LH rat, 5 of which are predicted to nega- 
tively impact function. Two prolactin genes (PrlSa2 and 
Prl4al) have variants predicted to be damaging in the 
LH rat. Interestingly, low serum prolactin levels have 
been reported to be associated with MetS in humans, 
both women and men [35,36]. Furthermore, plasma pro- 
lactin levels were found to be significantly decreased in 
the GK rat, an inbred model of type 2 diabetes. Interestingly,
in a cross between GK and BN rats, plasma
prolactin levels were linked to rat chromosome 17 in male
rats [37]. Furthermore, the GK and LH rats share the same
haplotypes for these genes, as do the BN and LN strains.
However, at this locus, the GK allele was actually associated
with higher plasma prolactin levels. Therefore, the impact
of the variants in these prolactin genes is unclear.

The remaining three genes with predicted functional
variants in the LH rat either had no known function
(RGD1563300 and LOC364753) or no previously reported
relationship with MetS, such as ENSRNOG00000012418,
which has sequence similarity with T cell receptor
gamma variable genes (TRGV).

Other genes in the haplotype blocks may have only
non-synonymous variants characterized as "benign" by
prediction software, but have been associated with symptoms of
MetS in previous research. Haplotype block 1 contains
RGD1562963, a rat ortholog of human C6ORF52. The bovine
ortholog of RGD1562963 falls within bovine QTL223,
which is involved in beef marbling, i.e., the deposition of
fat in bovine muscles [38,39]. This gene contains three
non-synonymous SNPs in LH, although the prediction software
categorized the variants as "benign." Amphiphysin
(Amph) is a gene in haplotype block 7 with a nonsynonymous
mutation causing an R632L amino acid change that
is categorized as "benign" by PolyPhen2 in the LH rat.
While loss of function mutations in this gene are known
to cause Stiff Person Syndrome [40], a SNP in Amph is
also associated with sagittal diameter (a measure of central
obesity) in the Framingham Heart Study 100 K dataset
[41]. LH rats also have a nonsynonymous (C200G) mutation
in Wac, a gene that may be essential in Golgi biosynthesis
[42]. Interestingly, the variant in Wac is unique to
LH and SS strains and could thus underlie their shared
phenotype of hypertension [13].

Finally, other genes in the haplotypes underlying the
MetS QTL on LH chromosome 17 have no identified
coding variants, but have notable functions related to MetS.
Blocks 8 to 10, separated by two 100 kb windows, include
several genes of note. Bambi is a protein that modifies
TGF-beta signals by acting as a pseudo-receptor [43].
Knocking out Bambi in the mouse results in a weight
decrease in females [44] and an increase in arterial wall
neovascularization [45]. Cul2 is part of the VHL tumor
suppression complex that ubiquitinates HIF1α [46]; the
disruption of HIF1α has been found to improve insulin
sensitivity and decrease adiposity in mice [47]. Also,
mutations in another member of the cullin family, CUL3, have
been found to cause some Mendelian forms of hypertension
[48]. Crem is an inducible CREB repressor whose
down-regulation has been shown to contribute to insulin
resistance in obese humans and mice through the resulting
increase in CREB expression [49], and mouse knockout
models show protection against cardiopathy and left
ventricular dysfunction, especially after exposure to
β1-adrenergic agonists [50,51]. In addition, Mpp7 has
been determined to cause at least one case of maturity-onset
diabetes of the young (MODY) [52] and has been
associated with left ventricular hypertrophy, BMI and
incidence of cardiovascular diseases in the Framingham study
[53-55]. Of note, this region falls in the peak of linkage for
blood pressure and plasma lipid QTL previously mapped
in the LH x LN intercross [10]. While these genes are
interesting candidate genes, further studies are required, for
example in congenic strains, to establish their roles in
MetS.

We could identify only one non-synonymous SNV on
the haplotype blocks of RNO17 that has the LH allele
overrepresented among strains having MetS symptoms, a
variant in LOC364753 (17:G65,701,876 T) that showed
significant enrichment among the strains LH, SS and SHR.
Interestingly, this is an LH variant predicted to be possibly
damaging. However, the variant T allele is actually
conserved across vertebrates; therefore it is not likely to play a
causal role in our phenotypes. Furthermore, many of these
variants are rare, which may decrease the power of
Fisher's exact test. For example, two of the non-synonymous
variations in RGD1562963 are only observed
in LH and SR/Jr among the sequenced strains (both
derived from SD rats), and the variations causing
non-synonymous mutations in Wac are only observed in LH
and SS rats. While the coselection of genes common to
hypertension is obvious in the LH and SS strains in the
case of Wac, the relevance to MetS of the alleles shared in
RGD1562963 between LH and SR strains is less obvious.
While SR rats are commonly studied as a normotensive 
model of the salt-sensitive SS/Jr, they actually have ele- 
vated body weights compared to SS rats [56,57]. There- 
fore, while performing association studies in inbred strains 
may identify some genes for MetS, the heterogeneity of 
the phenotypes and their underlying causes complicate 
gene discovery. In fact, our analyses across multiple inbred
rat strains that are models of hypertension, obesity, and
dyslipidemia found no genes in common between all
disease strains [13]. Furthermore, because the traits defining
MetS are multigenic traits in themselves, some risk alleles
may be present in 'normal' strains but are insufficient to
independently influence the phenotype. Therefore, it is
important to have genetic data from QTL mapping studies
or congenic strains to confirm the in silico findings.

Conclusions 

We utilized the ancestral history of the selective inbreeding
in the Lyon rat strains to identify LD blocks likely to
harbor causal genes by analyzing the OSD distribution
arising from the genome resequencing and overlaying
the blocks with QTL. Using this approach we have been able to
identify a group of genes on RNO17 that may contribute
to the traits underlying MetS in the LH rat strain.



The resequencing of several inbred rat strains includ- 
ing the Lyon strains provides a remarkable resource for 
identifying genes causing some of the most common hu- 
man diseases, such as metabolic syndrome and cardio- 
vascular disease. The sequence is the final component to 
round out integrative genetic approaches to identify 
novel MetS genes and we anticipate this resource will re- 
sult in the identification of many novel mechanisms of 
and therapies for one of the most common diseases of 
the 21st century. 

Methods 

Genome resequencing 

Genome sequencing of the LH, LN, and LL rats was
performed previously as described [13]. All animal
protocols were reviewed and approved by the IACUC at the
University of Iowa. Briefly, DNA was extracted from the
spleens of two individuals each from the LH (LH/MavRrrcAek),
LL (LL/MavRrrcAek) and LN (LN/MavRrrcAek)
strains, followed by 100 bp paired-end sequencing of
300-600 bp fragments on an Illumina HiSeq 2000 platform
as previously described [13]. Reads were then
aligned to the RGSC-3.4 rat reference genome [18] with
the Burrows-Wheeler Aligner version 0.5.8c [58]. The
Genome Analysis Toolkit version 1.0.6001 [59,60] was
then used to discover and genotype genomic variations.
Variants were called from reads with mapping
quality greater than 10 and bases with base quality
greater than or equal to 17, with the variant scores thereafter
recalibrated and filtered using GATK's GMM model
[60]. Sequencing gaps were identified as regions of zero
coverage from the output of the BEDTools [61] genomecov
function (Additional file 3: Table S3).

OSD analysis 

Observed Strain Differences (OSD) of non-overlapping 
100-kb windows across the genome were calculated as 
previously described [19]. OSD was defined as the number
of identified SNVs between the strains (where each
strain's genotype is homozygous) within a 100 kb window
divided by the number of nucleotides in that window
that have a definitive sequence call in all the strains
being compared. For all comparisons, only positions that 
have passed quality control and are homozygous across 
all strains within the comparison were used in OSD 
calculation. 
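
The OSD definition above can be sketched as a per-window ratio. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the input layout (a list of SNV positions plus a per-window count of callable positions) and the example numbers are all hypothetical.

```python
from collections import defaultdict

WINDOW = 100_000  # non-overlapping window size in bp, as in the text


def osd_by_window(snv_positions, callable_counts, window=WINDOW):
    """Return {window_index: OSD} for one chromosome.

    snv_positions: 1-based positions where every compared strain is
        homozygous and the strains differ.
    callable_counts: {window_index: number of positions with a definitive,
        homozygous call in all compared strains}.
    """
    snv_counts = defaultdict(int)
    for pos in snv_positions:
        snv_counts[(pos - 1) // window] += 1

    osd = {}
    for idx, n_callable in callable_counts.items():
        if n_callable > 0:  # windows with no usable sequence get no OSD
            osd[idx] = snv_counts[idx] / n_callable
    return osd


# Hypothetical input, for illustration only.
example = osd_by_window(
    [150, 99_999, 100_001, 250_000],
    {0: 80_000, 1: 50_000, 2: 90_000, 3: 0},
)
print(example)
```

Dividing by the number of callable positions rather than by the window length keeps poorly covered windows from appearing artificially similar.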

The density of the distribution of OSD amongst all windows
across the genome was first smoothed by binned kernel
density estimation [62] as implemented by the R [63] package
KernSmooth with default parameters. This means estimating
the kernel density on 401 equally spaced points
with a Gaussian kernel and with bandwidth estimated by
Wand and Jones' oversmoothed bandwidth selector. From the
kernel smoothing results, a Polymorphism Enrichment
Threshold (PET) was determined, defined as the OSD
value located at the local minimum of the smoothed density
that follows its first local maximum. Putative blocks of LD were
generated by identifying and merging contiguous 100 kb
windows with OSD values greater than or equal to the PET. These
blocks represent haplotypes that differ between the strains
being compared.
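
The thresholding and merging steps can be sketched as follows. This is an illustration only: it uses a plain (unbinned) Gaussian kernel density estimate in pure Python rather than the KernSmooth package, and the bandwidth formula and the grid range of four bandwidths beyond the data are assumptions about what the package defaults compute. All function names and the synthetic OSD values are hypothetical.

```python
import math


def oversmoothed_bandwidth(values):
    """Oversmoothed bandwidth for a Gaussian kernel (assumed formula)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (1 / (4 * math.pi)) ** 0.1 * (243 / (35 * n)) ** 0.2 * sd


def gaussian_kde(values, gridsize=401):
    """Evaluate an unbinned Gaussian KDE on equally spaced grid points."""
    h = oversmoothed_bandwidth(values)
    lo, hi = min(values) - 4 * h, max(values) + 4 * h
    step = (hi - lo) / (gridsize - 1)
    grid = [lo + k * step for k in range(gridsize)]
    norm = len(values) * h * math.sqrt(2 * math.pi)
    density = [
        sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values) / norm
        for g in grid
    ]
    return grid, density


def polymorphism_enrichment_threshold(grid, density):
    """OSD value at the first local minimum after the first local maximum."""
    last = len(density) - 1
    i = 1
    while i < last and not (density[i] > density[i - 1]
                            and density[i] >= density[i + 1]):
        i += 1
    j = i + 1
    while j < last and not (density[j] < density[j - 1]
                            and density[j] <= density[j + 1]):
        j += 1
    if j >= last:
        raise ValueError("no local minimum after the first local maximum")
    return grid[j]


def merge_divergent_windows(osd_values, pet):
    """Merge contiguous windows with OSD >= PET into (first, last) blocks."""
    blocks, start = [], None
    for idx, osd in enumerate(osd_values):
        if osd >= pet:
            if start is None:
                start = idx
        elif start is not None:
            blocks.append((start, idx - 1))
            start = None
    if start is not None:
        blocks.append((start, len(osd_values) - 1))
    return blocks


# Synthetic OSD values, for illustration only: 80 near-identical windows
# followed by 20 divergent windows.
osd = [i * 2e-6 for i in range(80)] + [0.004 + i * 1e-5 for i in range(20)]
grid, density = gaussian_kde(osd)
pet = polymorphism_enrichment_threshold(grid, density)
blocks = merge_divergent_windows(osd, pet)
print(pet, blocks)
```

With a clearly bimodal OSD distribution the threshold falls in the trough between the two modes, so the windows in the upper mode are the ones merged into blocks.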

Downstream analyses 

Genes located within the haplotype blocks were
identified using Ensembl version 69 [64] gene annotations
as provided by Ensembl BioMart [65] in the Rn4
assembly. The effects of the identified SNVs and indels
were predicted by Ensembl's Variant Effect Predictor
(VEP) [22] based on Ensembl version 69 data and using
Ensembl consequence terms. For the purpose of this
paper, non-synonymous variations are defined as variations
containing the term NON_SYNONYMOUS_CODING
as the predicted consequence. Similarly, splice-site
variants are defined by the terms SPLICE_SITE and
ESSENTIAL_SPLICE_SITE, frameshift variants by the term
FRAMESHIFT_CODING, and stop-gained variations by
the term STOP_GAINED from the VEP output. PolyPhen
version 2.2.2 [23] was used to predict the effects of
SNPs identified as nonsynonymous by VEP, based on
UniProt 2012_09 data [66].
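
The term-based definitions above can be expressed as a simple lookup. The sketch below is illustrative only; it assumes the consequence field of a VEP record is a comma-separated string of Ensembl-style terms, and the function and dictionary names are hypothetical.

```python
# Mapping from the consequence terms named in the text to the variant
# classes used in this paper.
CONSEQUENCE_CLASSES = {
    "NON_SYNONYMOUS_CODING": "non-synonymous",
    "SPLICE_SITE": "splice site",
    "ESSENTIAL_SPLICE_SITE": "splice site",
    "FRAMESHIFT_CODING": "frameshift",
    "STOP_GAINED": "stop gained",
}


def classify(consequence_field):
    """Return the sorted variant classes found in one consequence field.

    Assumes a comma-separated list of terms; unlisted terms are ignored.
    """
    terms = [t.strip() for t in consequence_field.split(",")]
    return sorted({CONSEQUENCE_CLASSES[t] for t in terms
                   if t in CONSEQUENCE_CLASSES})


print(classify("NON_SYNONYMOUS_CODING,SPLICE_SITE"))
print(classify("INTRONIC"))
```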

The genotypes of SNVs located within the LH/LN
haplotype blocks on RNO17 that have been annotated to
cause non-synonymous mutations or splice-site mutations
among the rat strains sequenced by Atanur et al. [13] were
obtained from Variant Visualizer within the Rat Genome
Database [20]. Potential enrichment of the LH allele
among the obese strains (LH, SBH, SS, SHR, LL and
LEW), dyslipidemic strains (LH, SS and SHR) and
hypertensive strains (LH, FHH, MHS, SBH, SHR, SHRSP and
SS) against the other strains was statistically tested using a
two-tailed Fisher's exact test. In this analysis all substrains
of BN were excluded, as they were considered identical to
the reference sequence. In addition, the strain BBDP/Rhw
was excluded out of concern that the Type I diabetes
phenotype may be confounding.
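
For one variant, the enrichment test reduces to a 2 x 2 table of strains: phenotype group versus other strains, LH allele versus other allele. The following is a minimal pure-Python sketch of a two-tailed Fisher's exact test, which sums the probabilities of all tables that are no more likely than the observed one. The function name and the counts in the example are hypothetical and do not reproduce the study's data.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    a: phenotype strains carrying the LH allele
    b: phenotype strains carrying the other allele
    c: remaining strains carrying the LH allele
    d: remaining strains carrying the other allele
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x LH alleles in the phenotype group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with p_obs are counted.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts, for illustration only: all 3 phenotype strains carry
# the LH allele and none of the 3 remaining strains do.
p_value = fisher_exact_two_tailed(3, 0, 0, 3)
print(p_value)
```

Even this perfectly separated table does not reach p < 0.05 with six strains, which illustrates why rare alleles limit the power of the test in a small strain panel.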

SNP genotyping-based haplotype blocks between LH
and LN strains on RNO17 were identified by visual
inspection for contiguous regions of polymorphism from
the STAR SNP genotype panel [12]; the haplotype blocks
are defined to be the regions between the flanking
monomorphic SNPs surrounding the regions of
polymorphic SNPs.
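
The block definition above was applied by visual inspection; the same rule can be written down programmatically. The sketch below is only an illustration of that rule, with a hypothetical function name, input layout and positions.

```python
def panel_blocks(snps):
    """Blocks bounded by the monomorphic SNPs flanking polymorphic runs.

    snps: list of (position, is_polymorphic) tuples sorted by position,
        where is_polymorphic means the two strains differ at that SNP.
    Returns (left_flank, right_flank) per run; a flank is None when the
    run reaches the end of the panel.
    """
    blocks = []
    run_open = False
    left_flank = None
    last_mono = None
    for pos, polymorphic in snps:
        if polymorphic:
            if not run_open:
                run_open = True
                left_flank = last_mono
        else:
            if run_open:
                blocks.append((left_flank, pos))
                run_open = False
            last_mono = pos
    if run_open:
        blocks.append((left_flank, None))
    return blocks


# Hypothetical panel, for illustration only.
panel = [(100, False), (200, True), (300, True),
         (400, False), (500, False), (600, True)]
print(panel_blocks(panel))
```

Because the boundaries are placed at the flanking monomorphic markers, blocks defined this way can only be as precise as the marker spacing of the panel.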

Variant confirmation 

Seven non-synonymous variants on the haplotype blocks
on RNO17 listed in Table 3 were confirmed by Sanger
sequencing (Additional file 2: Table S2). Primers for
these amplicons were designed by Primer-BLAST [67] using
the region 1 kb upstream and downstream of the variation
as template, with an M13 sequence (5'-TGT AAA ACG ACG
GCC AGT-3') tagged at the 5' ends of the forward primer
sequences and another M13 sequence (5'-GTG TGG AAT
TGT GAG CGG-3') tagged at the 5' ends of the reverse
primer sequences. Template sequence was based on the rn4
assembly, with the exception of one amplicon. Because the
flanking region downstream of the variation at 17:43,278,266
contained a large stretch of gaps in the rn4 assembly, the
sequencing primer set for this variation was designed using
the corresponding location in the rn5 assembly (17:40,575,021).

PCR amplification was performed, and products were
purified by gel electrophoresis and then sequenced
bidirectionally with the M13 primers listed above on an ABI
3730xl sequencer with BigDye version 3.1 chemistry (Life
Technologies). Sequence traces were aligned to the genome
using SeqMan version 9.1.0 (DNASTAR Inc., Madison,
WI, USA). SNVs were validated if both strains had
sequence passing QC and base-calling was unambiguous.

Data deposition and availability of supporting data 

All sequence data were deposited in the EBI Sequence Read
Archive with accession number ERP002160
(http://www.ebi.ac.uk/ena/data/view/ERP002160) as reported
previously [13]. Sequence variants are available at the Rat
Genome Database (RGD; http://rgd.mcw.edu/).

Additional files 



Additional file 1: Table S1. Haplotype blocks identified between LH
and LN strains.

Additional file 2: Table S2. Sanger sequence validation of selected 
variants. 

Additional file 3: Table S3. Sequence coverage in haplotype blocks. 